Crowd & Prejudice: An Impossibility Theorem for Crowd Labelling without a Gold Standard
Authors
Abstract
A common use of crowd sourcing is to obtain labels for a dataset. Several algorithms have been proposed to identify uninformative members of the crowd so that their labels can be disregarded and the cost of paying them avoided. One common motivation of these algorithms is to do without any initial set of trusted labeled data. We analyse this class of algorithms as mechanisms in a game-theoretic setting to understand the incentives they create for workers. We find an impossibility result: without any ground truth, and when workers have access to commonly shared “prejudices” upon which they agree but which are not informative of true labels, there are always equilibria in which all agents report the prejudice. A small amount of gold standard data is found to be sufficient to rule out these equilibria.
INTRODUCTION
For “the crowd” is untruth. – Kierkegaard
Previous literature has proposed a large number of algorithms that take a set of data points labeled by a group of agents and try to estimate the reliability of the agents. These algorithms can be divided into two sets: those that leverage a small amount of gold standard (ground truth) data (Snow, O’Connor, Jurafsky & Ng 2008, Wauthier & Jordan 2011), and those that do not (Dekel & Shamir 2009b, Raykar, Yu, Zhao, Jerebko, Florin, Hermosillo Valadez, Bogoni & Moy 2009, Raykar, Yu, Zhao, Valadez, Florin, Bogoni & Moy 2010, Kumar & Lease 2011, Dekel & Shamir 2009a, Yan, Rosales, Fung, Schmidt, Hermosillo, Bogoni, Moy & Dy 2010). The algorithms that do without gold standard data do so by treating agreement among different labellers as indicative of the correctness of a label. This agreement is either at the level of how to label a given datapoint, as in most cases, or at the level of how features map to labels, as in (Dekel & Shamir 2009b).
To achieve this they place their trust in agents who provide labels that are consistent with the labels provided by other agents or, in the case where the same datapoint is not labeled twice, whose proposed feature-to-label mapping is consistent with other agents’ mapping of features to labels. It is often the case that labellers want to be seen as informed by those who are collecting the labels, as the labelling tasks often pay and it is natural for those collecting the data to avoid the unnecessary cost of paying for labels from uninformed labellers.
We analyse the class of algorithms that do not use gold standard data as mechanisms in a game-theoretic setting, in order to understand the incentives they create for the agents providing the labels. We first present an impossibility result: without gold label data, and when workers have access to commonly shared “prejudices” upon which they agree but which are not informative of true labels, there are always equilibria in which the mechanism does not obtain the true labels from the informed workers, but rather all workers report the prejudice. We then consider how a small amount of gold data is generally sufficient to push situations where the prejudice is reported by informed players outside the equilibrium set.
One possible criticism of our work is that there is little interest in pointing out that, when the assumptions of a statistical model (in this case, that agreement among labellers indicates correctness) do not hold, the conclusions drawn from such a model can be misleading. Our argument, however, is more subtle than this: the incentives created by the natural application of the model to its intended task undermine the very assumptions of the model, by encouraging players to agree with others on the labelling, irrespective of whether they believe these to be the true labels.
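The incentive argument above can be made concrete with a toy calculation. The following Python sketch is an illustration under stated assumptions, not the paper's formal model: labels are binary, the shared prejudice matches the true label with probability `p_match` (0.5, i.e. uninformative), an informed worker is paid one unit per item on which they agree with a peer, and a worker who misses any gold item is dismissed and earns nothing. The function names, the payment rule, and the dismissal rule are all hypothetical choices made for this example.

```python
def expected_payoff(strategy, others, n_items=100, n_gold=0, p_match=0.5):
    """Expected payoff of one informed worker in a toy agreement-paying mechanism.

    strategy, others: "truth" (report the true label) or "prejudice"
    (report the commonly shared but uninformative signal).
    Assumed rules (hypothetical, for illustration only):
      * one unit of pay per item on which the worker agrees with a peer;
      * a worker who gets any of the n_gold gold items wrong earns nothing.
    """
    # Two truth-tellers always agree, as do two prejudice-reporters.
    # A truth-teller and a prejudice-reporter agree only when the
    # prejudice happens to match the true label.
    agree = 1.0 if strategy == others else p_match
    # Truth-tellers always pass the gold checks; prejudice-reporters pass
    # each gold item independently with probability p_match.
    pass_gold = 1.0 if strategy == "truth" else p_match ** n_gold
    return pass_gold * agree * n_items


def is_equilibrium(common, n_gold=0):
    """True if no single worker gains by deviating when all play `common`."""
    deviation = "truth" if common == "prejudice" else "prejudice"
    stay = expected_payoff(common, common, n_gold=n_gold)
    deviate = expected_payoff(deviation, common, n_gold=n_gold)
    return stay >= deviate


if __name__ == "__main__":
    for k in (0, 1, 2):
        print(k, is_equilibrium("prejudice", n_gold=k), is_equilibrium("truth", n_gold=k))
```

In this toy setting, "everyone reports the prejudice" is an equilibrium when there are no gold items, and it stops being one once two gold items are checked, while "everyone reports the truth" remains an equilibrium throughout. The exact threshold depends entirely on the assumed payment and dismissal rules.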
PROCEEDINGS, CI 2012
To make the situation we have in mind more concrete, and to clarify how it differs from the standard information cascades studied in economics, consider the hypothetical example of a professor who assigns their teaching assistants to grade exams, without grading any themselves. The TAs may or may not know the topic at hand (be informed or uninformed) and they must provide a grade (label) for each exam they are assigned. If the TAs each grade one question on each exam and do so sequentially, so that for each previous answer in a given exam they can observe the grades other TAs have assigned, what in economics is referred to as an information cascade can occur. TAs grading later questions can look at the grades a student received for earlier exam questions and guess that the question they were assigned will receive a similar grade, instead of having to understand the answer the student gave and how it relates to the correct answer. In contrast, we study a related but different situation, analogous to one in which each TA grades (possibly overlapping) full exams, and they do so simultaneously, without access to the grades the others are assigning. Note that if the TAs expect to be rewarded for agreement with others, and if they believe others may use some prejudice to grade the exam, such as assigning higher grades to, say, students with neat handwriting or who use longer words, they might be motivated to also use said prejudice.
An equivalent example can be considered in a crowd sourcing context. Suppose we ask speakers of language B for translations of a given word in language A; a common prejudice among speakers of language B would be that if a word in language A sounds like a word in language B, it must translate to that word.
Even if bilingual speakers of A and B are present in the worker pool, if they believe that the consensus label will be the similar-sounding (but possibly incorrect) translation, they may choose to report this prejudice so as to continue to be employed in the translation task.
Journal:
- CoRR
Volume: abs/1204.3511; issue: not given
Pages: -
Publication date: 2012